Encoding Shape and Spatial Relations

Authors

Robert A. Jacobs

Stephen M. Kosslyn

Abstract

An effective functional architecture facilitates interactions among subsystems that are often used together. Computer simulations showed that differences in receptive field sizes can promote such organization. When input was filtered through relatively small nonoverlapping receptive fields, artificial neural networks learned to categorize shapes relatively quickly; in contrast, when input was filtered through relatively large overlapping receptive fields, networks learned to encode specific shape exemplars or metric spatial relations relatively quickly. Moreover, when the receptive field sizes were allowed to adapt during learning, networks developed smaller receptive fields when they were trained to categorize shapes or spatial relations, and developed larger receptive fields when they were trained to encode specific exemplars or metric distances. In addition, when pairs of networks were constrained to use input from the same type of receptive fields, networks learned a task faster when they were paired with networks that were trained to perform a compatible type of task. Finally, using a novel modular architecture, networks were not pre-assigned a task, but rather competed to perform the different tasks. Networks with small nonoverlapping receptive fields tended to win the competition for categorical tasks whereas networks with large overlapping receptive fields tended to win the competition for exemplar/metric tasks.

Encoding Shape and Spatial Relations: The Role of Receptive Field Size in Coordinating Complementary Representations

The brain is not a single, undifferentiated neural network. Rather, it clearly has a modular structure, and the individual subsystems interact in complex ways (e.g., see Desimone and Ungerleider, 1989; Felleman and Van Essen, 1991).
Many Artificial Intelligence researchers have pointed out the virtues of modular architectures (e.g., Marr, 1982; Simon, 1981), and the modular organization of the brain may reflect the operation of fundamental computational principles. Specifically, distinct subsystems may have developed to carry out incompatible input/output mappings (see Kosslyn and Koenig, 1992). However, subsystems often are not entirely independent; rather, many may work together to accomplish most tasks. If so, then an effective computational architecture will "yoke" subsystems that often operate together, facilitating their joint operation. In this article we propose a simple mechanism that will serve to yoke at least some subsystems used in visual perception.

Visual perception relies on two major systems, which are localized in different parts of the brain (e.g., see Mishkin, Ungerleider, and Macko, 1983). The "dorsal system" runs up from the occipital lobe to the parietal lobe, and encodes spatial properties such as location, orientation, and size. In contrast, the "ventral system" runs from the occipital lobe down to the inferior temporal lobe, and this system encodes object properties such as shape, color and texture (for a review, see Chapter 3 of Kosslyn and Koenig, 1992). One virtue of this architecture is that it allows the ventral system to ignore location in the field during object recognition while at the same time the dorsal system preserves this information for other purposes (cf. Gross and Mishkin, 1977).

These two large systems can be broken down into sets of more specialized subsystems. First, let us consider two more fine-grained subsystems in the dorsal system. Conceptually, there is a clear distinction between categorical spatial relations, such as above/below, left/right, and on/off, and coordinate spatial relations that specify locations in a way that can be used to guide precise movements.
Categorical spatial relations group a range of positions and treat them as equivalent; such representations are an essential feature of structural descriptions, which specify the arrangement of an object's parts in a way that applies to all of the object's various shape configurations. In contrast, metric coordinate spatial relations specify the information that is discarded by categorical relations. Such precise spatial information is essential for reaching and navigation. This is a good example of incompatible input/output mappings: To encode categorical spatial relations, a subsystem must discard the very information that is required to encode coordinate spatial relations. This conceptual distinction suggests that distinct subsystems may encode the two types of spatial relations.

One way in which researchers have established the functional distinction between two subsystems is rooted in the logic of a "double dissociation" (Teuber, 1955). In this case, the aim is to demonstrate that one subsystem operates more effectively in one cerebral hemisphere whereas the other operates more effectively in the other cerebral hemisphere. If there were only one way to encode spatial relations, either one of the hemispheres would be generally better or there would be no difference between the hemispheres; hence, if one hemisphere is better at encoding one sort of spatial relation, but the other hemisphere is better at another sort, this is good evidence for the existence of distinct subsystems. Many experiments have now demonstrated that the left cerebral hemisphere encodes categorical spatial relations (above/below, right/left, and on/off) more effectively than the right hemisphere (although this effect is often small in a given experiment), but the right cerebral hemisphere encodes metric coordinate spatial relations more effectively than the left hemisphere (for a review and meta-analysis, see Kosslyn, Chabris, Marsolek, and Koenig, 1992).
Similarly, there is evidence that the ventral system also can be decomposed into at least two encoding subsystems. Conceptually, there is a clear distinction between recognizing a stimulus as a member of a category (e.g., a dog) versus recognizing it as a specific exemplar (Fido). A category groups various exemplars and treats them as equivalent, whereas identifying an exemplar requires treating the instances as distinct. Again, the two mappings are incompatible: The very information that is needed to specify a specific example must be ignored to assign it to a category. And again, there is evidence that the cerebral hemispheres are specialized for the different types of encoding.

For example, Marsolek, Kosslyn, and Squire (1992) asked subjects to read a list of words and to rate the degree to which they liked each word. This was a cover task, intended only to lead the subjects to look at each word. Following this, the subjects saw word stems (e.g., "CAS"), and were asked to complete the stems to form the first word that came to mind (e.g., "CASTLE"). The stems were presented in the left visual field (and hence were seen initially by the right cerebral hemisphere) or in the right visual field (and hence were seen initially by the left cerebral hemisphere). The words were divided into two lists, only half of which were shown to a given subject at the outset; each list was shown to half of the subjects. "Priming" was measured by observing how many words were completed to form the words on the corresponding list when it was seen initially, compared to the number of words that were completed to form words on the other list. More interesting, there was greater priming if the words seen initially were the same typographic case as the stem, and this advantage was totally confined to the right hemisphere. This result indicates that the right hemisphere represents specific exemplars better than the left hemisphere.
Some priming still occurred in the left hemisphere, however, and there the amount of priming was equivalent whether the initial and test words were in the same or in different typographic case. Thus, the left hemisphere could represent information in visual categories. Marsolek (1992) went on to show that the left hemisphere actually is better than the right when prototypes of dot patterns must be abstracted and matched to novel stimuli. Again, then, we find a double dissociation, which indicates that different processes match input to representations of exemplars or visual categories.

We were struck by the fact that categorical spatial relations and category representations of shape are encoded more effectively in the left hemisphere, whereas coordinate spatial relations and exemplar representations of shape are encoded more effectively in the right cerebral hemisphere. This arrangement makes sense; indeed, the purposes of the different types of spatial relations representations can only be accomplished effectively in the context of the appropriate types of representations of shape. An effective structural description will generalize to all instances of an object; this requires not only that the arrangement of parts be specified in a way that is robust over contortions of the object, but also that the parts themselves be represented in a way that will generalize over the variations in the shapes of parts. Similarly, to reach or navigate effectively, one not only needs to know how far away an object is, but also needs to know its precise shape; one will move around a table differently if it is square or circular. It seems clear that an effective computational architecture will facilitate interaction between the complementary types of processing.

But how does such coordination occur? Given the large amount of plasticity in the developing brain (e.g., see Dennis and Kohn, 1975; Dennis and Whitaker, 1976), the subsystems may not be innately configured in this way.
Indeed, Kosslyn, Koenig, Brown, and Gazzaniga (cited in Chapter 9 of Kosslyn and Koenig, 1992) describe a split-brain patient in whom the usual pattern of laterality of spatial relations encoding, as inferred from divided-visual-field studies in normal subjects, appears to be reversed.

Kosslyn et al. (1992) showed that a simple property of neural information processing is capable of producing the observed left-hemisphere specialization for categorical spatial relations and the observed right-hemisphere specialization for metric coordinate spatial relations. In their neural network simulations, networks were trained to make a categorical judgment (whether a dot was above or below a bar) or a metric coordinate judgment (whether a dot was within four metric units of the bar). The networks could be trained to encode categorical spatial relations more effectively when the input units had relatively small, nonoverlapping receptive fields. These small receptive fields apparently helped the network to delineate pockets of space and to specify the spatial relations with respect to these pockets. In contrast, the networks could be trained to encode metric coordinate spatial relations more effectively when the input units had relatively larger, overlapping receptive fields. These large overlapping fields promoted the use of "coarse coding" to precisely encode the metric relation between a dot and bar.

There is much evidence that is consistent with the idea that the left hemisphere monitors the outputs of neurons with relatively small receptive fields whereas the right hemisphere monitors the outputs of neurons with relatively large receptive fields. Van Kleeck (1989) reports a meta-analysis of studies in which subjects are asked to find a target when they are shown large letters that are made by arranging copies of a small letter.
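As a rough illustration of the coarse-coding idea described for the Kosslyn et al. (1992) simulations, the following minimal Python sketch contrasts narrow and wide receptive fields on a one-dimensional strip of space. Everything in it is an assumption made for illustration and is not taken from those simulations: the Gaussian response profile, the spacing of the units, the two field widths, the activity threshold, and the names `receptive_field_responses`, `active_units`, and `decode_position` are all hypothetical. It illustrates only the metric half of the argument: with wide, overlapping fields several units respond in a graded way, and a simple population average recovers the dot's position more precisely than it can when only one narrow field responds.

```python
import math


def receptive_field_responses(dot_position, centers, width):
    """Response of each input unit to a dot, assuming a Gaussian field profile."""
    return [
        math.exp(-((dot_position - c) ** 2) / (2.0 * width ** 2))
        for c in centers
    ]


def active_units(responses, threshold=0.1):
    """Count the units whose response exceeds an (assumed) activity threshold."""
    return sum(1 for r in responses if r > threshold)


def decode_position(responses, centers):
    """Estimate the dot position as the response-weighted mean of field centers."""
    total = sum(responses)
    return sum(r * c for r, c in zip(responses, centers)) / total


# Hypothetical layout: 11 units whose field centers are spaced 2 units apart.
centers = [2.0 * i for i in range(11)]
dot = 9.3  # hypothetical dot position, deliberately not on a field center

# Narrow fields barely overlap; wide fields overlap heavily.
small = receptive_field_responses(dot, centers, width=0.4)
large = receptive_field_responses(dot, centers, width=3.0)

small_error = abs(decode_position(small, centers) - dot)
large_error = abs(decode_position(large, centers) - dot)

print("units active (small fields):", active_units(small))
print("units active (large fields):", active_units(large))
print("decoding error, small fields:", round(small_error, 3))
print("decoding error, large fields:", round(large_error, 3))
```

With the narrow fields the population response says little more than which pocket of space contains the dot. That is the kind of information a categorical judgment needs, although this sketch does not itself test the categorical claim.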
They classify a smaller, component letter faster if the stimulus is presented initially to the left hemisphere, but classify the larger, overall letter faster if the stimulus is presented initially to the right hemisphere. Moreover, patients with damage to the left posterior superior temporal lobe have difficulty identifying component letters of hierarchical stimuli, whereas patients with damage to the right posterior superior temporal lobe have difficulty identifying the overall pattern formed by the smaller letters (e.g., Robertson and Lamb, 1991; Robertson, Lamb, and Knight, 1991). Similarly, subjects categorize high spatial frequency gratings faster if they are presented to the left hemisphere than to the right, but vice versa for low spatial frequency gratings (e.g., see Christman, Kitterle, and Hellige, 1991; Kitterle and Selig, 1991). Small receptive fields would detect finer variations better than large receptive fields, and vice versa for larger spatial variations.

The present research was designed to test the hypothesis that differences in the sizes of receptive fields can coordinate complementary representations of shape and spatial relations. We hypothesized that relatively small, nonoverlapping receptive fields promote the development of both categorical spatial relations representations and representations of shape categories, whereas relatively large, overlapping receptive fields promote the development of both metric spatial relations representations and representations of individual shapes.

General Method

We used four tasks in the experiments reported in this article. Two tasks required a system to make categorical judgments (these tasks are labeled cat) and two tasks required coordinate judgments (these tasks are labeled coo). Within these four tasks, two involved judging the spatial relation of one shape with respect to another (these tasks are labeled where) and two involved judging the identity of a shape (these tasks are labeled what).
Specifically, a network specified whether a shape was above or below a horizontal bar when performing the where/cat task, and specified whether a shape was within two metric units of the bar or whether it was more than two units away when performing the where/coo task. The what/cat task required a network to specify the category of a shape, and the what/coo task required it to specify a shape's individual identity. Each shape category was defined by a prototype, which was a pattern formed in a 5×5 grid. By selecting one pixel of a category's prototype and perturbing that pixel one unit ...
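The two where tasks can be made concrete with a small Python sketch of their target labels. This is only an illustration under stated assumptions: the function names `where_cat` and `where_coo`, the row-index convention (a smaller row index means higher in the display), the label strings, and the bar and shape positions are all hypothetical; only the above/below judgment and the two-unit criterion come from the text. The sketch shows why the two mappings are incompatible: positions that share a categorical label can differ in their coordinate label, and vice versa.

```python
def where_cat(shape_row, bar_row):
    """Categorical judgment: is the shape above or below the bar?"""
    # Assumed convention: a smaller row index is higher in the display.
    # A shape lying on the bar itself is not considered in this sketch.
    return "above" if shape_row < bar_row else "below"


def where_coo(shape_row, bar_row, criterion=2):
    """Coordinate judgment: within `criterion` units of the bar, or farther?"""
    return "within" if abs(shape_row - bar_row) <= criterion else "beyond"


bar_row = 10                # hypothetical bar position
positions = [7, 9, 11, 13]  # hypothetical shape positions

labels = [
    (row, where_cat(row, bar_row), where_coo(row, bar_row))
    for row in positions
]
for row, cat_label, coo_label in labels:
    print(row, cat_label, coo_label)
```

Rows 7 and 9 receive the same categorical label but different coordinate labels, whereas rows 9 and 11 receive the same coordinate label but different categorical labels.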

Related articles

Does a causal relation exist between the functional hemispheric asymmetries of visual processing subsystems?

Past research indicates that specific shape recognition and spatial-relations encoding rely on subsystems that exhibit right-hemisphere advantages, whereas abstract shape recognition and spatial-relations encoding rely on subsystems that exhibit left-hemisphere advantages. Given these apparent regularities, we tested whether asymmetries in shape processing are causally related to asymmetries in...


Visual Learning of Statistical Relations Among Non-adjacent Features: Evidence for Structural Encoding.

Recent results suggest that observers can learn, unsupervised, the co-occurrence of independent shape features in viewed patterns (e.g., Fiser & Aslin, 2001). A critical question with regard to these findings is whether learning is driven by a structural, rule-based encoding of spatial relations between distinct features or by a pictorial, template-like encoding, in which spatial configurations...


Unsupervised Grounding of Spatial Relations

We present an unsupervised connectionist model for grounding color, shape and spatial relations of two objects in 2D space. The model constitutes a two-layer architecture that integrates information from visual and auditory inputs. The images are presented as the visual inputs to an artificial retina and five-word sentences describing them (e.g. “Red box above green circle”) serve as auditory in...


Neural Network Models as Evidence for Different Types of Visual Representations

Cook (1995) criticizes the work of Jacobs and Kosslyn (1994) on spatial relations, shape representations, and receptive fields in neural network models on the grounds that first-order correlations between input and output unit activities can explain the results. We reply briefly to Cook’s arguments here (and in Kosslyn, Chabris, Marsolek, Jacobs, & Koenig, 1995) and discuss how new simulations ...


Independent representation of parts and the relations between them: evidence from integrative agnosia.

Whether objects are represented as a collection of parts whose relations are coded independently remains a topic of ongoing discussion among theorists in the domain of shape perception. S. M., an individual with integrative agnosia, and neurologically intact ("normal") individuals learned initially to identify 4 target objects constructed of 2 simple volumetric parts. At test, the targets were ...
